54 research outputs found

    Annotation and classification of nuclear domains

    In this doctoral thesis, the majority of all nuclear proteins were annotated and classified. Known nuclear domains were identified from the literature and from protein sequence and domain databases, their boundaries were determined with the help of tertiary structures or secondary structure predictions, and multiple sequence alignments were constructed. The hand-curated alignments were used to build Hidden Markov Models, which were implemented in the domain prediction program Simple Modular Architecture Research Tool (Schultz et al. 1998, Schultz et al. 2000) (http://smart.embl-heidelberg.de/). There, comprehensive information on literature, phylogenetic distribution, number of proteins involved and function is summarized for 164 domains (118 of which originate from this work), covering more than 35,000 proteins. Building on the complete collection of nuclear proteins, selected nuclear and non-nuclear proteins were examined using homology-based sequence analysis methods. The work led to the discovery of four biologically relevant new domains:
    - L27, a new heterodimer-forming domain in the receptor targeting proteins Lin-2 and Lin-7 (Doerks et al. 2000)
    - GRAM, a new domain in glucosyltransferases, myotubularins and other membrane-associated proteins (Doerks et al. 2000)
    - DDT, a new DNA-binding domain in various transcription factors, chromosome-associated and other nuclear proteins (Doerks et al. 2001)
    - BSD, a new putative DNA-binding domain in transcription factors, synapse-associated and other hypothetical proteins (Doerks et al. submitted)
    Finally, 24,000 nuclear proteins were analyzed automatically, yielding 550 hypothetical new domains. Intensive examination of these 550 conserved sequence regions led to the discovery of 28 new nuclear or partly nuclear domains that differ in species distribution, function and biological relevance (Doerks et al. accepted)

    Method development and example applications for protein and nucleotide sequence analyses

    Bioinformatics serves to solve biological problems and to gain insight with the help of computational methods, and it forms the link between information science and the life sciences. No fixed definition of the term "bioinformatics" exists: it covers a broad field, ranging from the automated sequencing of whole genomes, through functional analyses by homology searches in databases, structure prediction and modelling, to chip-controlled prosthetics. In molecular biology, bioinformatics provides support in recording and managing nucleotide and amino acid sequences in databases. The rapid progress in whole-genome sequencing is leading to an explosively growing volume of data and hence to steadily growing opportunities for bioinformatics applications, which in turn are needed to cope with this volume of data. The genome projects have so far yielded the complete genome sequences of species from all three superkingdoms: Eubacteria: Haemophilus influenzae [17], Mycoplasma genitalium [18], Synechocystis sp. [29] and others. Archaebacteria: Methanococcus jannaschii [16] and others. Eukaryotes: Saccharomyces cerevisiae and others. By the end of 1998, the sequencing of the nematode (roundworm) Caenorhabditis elegans and of humans is also expected to be completed. ... The aim of this diploma thesis was to develop a program that predicts the intrinsic properties of a protein. It is intended to analyze and interpret the output of three programs (Coils2, TopPred2 and SignalP), to produce a prediction of the presence of coiled coils, transmembrane regions and signal peptides, and to report the affected regions of the amino acid sequence. The tool can be used via the World Wide Web.
    In a further part, the function of proteins whose functional properties are unknown is to be elucidated and described through homology searches and interpretation of similarities to proteins of known function. These are proteins that have been grouped, on the basis of sequence similarity, into 58 families known as UPFs (uncharacterized protein families)

    DCD – a novel plant specific domain in proteins involved in development and programmed cell death

    BACKGROUND: Recognition of microbial pathogens by plants triggers the hypersensitive reaction, a common form of programmed cell death in plants. These dying cells generate signals that activate the plant immune system and alarm the neighboring cells as well as the whole plant to activate defense responses to limit the spread of the pathogen. The molecular mechanisms behind the hypersensitive reaction are largely unknown except for the recognition process of pathogens. We delineate the NRP-gene in soybean, which is specifically induced during this programmed cell death and contains a novel protein domain, which is commonly found in different plant proteins. RESULTS: The sequence analysis of the protein, encoded by the NRP-gene from soybean, led to the identification of a novel domain, which we named DCD, because it is found in plant proteins involved in development and cell death. The domain is shared by several proteins in the Arabidopsis and the rice genomes, which otherwise show a different protein architecture. Biological studies indicate a role of these proteins in phytohormone response, embryo development and programmed cell death by pathogens or ozone. CONCLUSION: It is tempting to speculate that the DCD domain mediates signaling in plant development and programmed cell death and could thus be used to identify interacting proteins to gain further molecular insights into these processes

    A computational screen for type I polyketide synthases in metagenomics shotgun data

    BACKGROUND: Polyketides are a diverse group of biotechnologically important secondary metabolites that are produced by multi-domain enzymes called polyketide synthases (PKS). METHODOLOGY/PRINCIPAL FINDINGS: We have estimated frequencies of type I PKS (PKS I) – a PKS subgroup – in natural environments by using Hidden-Markov-Models of eight domains to screen predicted proteins from six metagenomic shotgun data sets. As the complex PKS I have similarities to other multi-domain enzymes (like those for the fatty acid biosynthesis) we increased the reliability and resolution of the dataset by maximum-likelihood trees. The combined information of these trees was then used to discriminate true PKS I domains from evolutionarily related but functionally different ones. We were able to identify numerous novel PKS I proteins, the highest density of which was found in Minnesota farm soil with 136 proteins out of 183,536 predicted genes. We also applied the protocol to the UniRef database to improve the annotation of proteins with so far unknown function and identified some new instances of horizontal gene transfer. CONCLUSIONS/SIGNIFICANCE: The screening approach proved powerful in identifying PKS I sequences in large sequence data sets and is applicable to many other protein families

    Annotation of the M. tuberculosis Hypothetical Orfeome: Adding Functional Information to More than Half of the Uncharacterized Proteins

    The genome of Mycobacterium tuberculosis (H37Rv) contains 4,019 protein coding genes, of which more than a thousand have been categorized as ‘hypothetical’, implying that for these not even weak functional associations could be identified so far. We here predict reliable functional indications for half of this large hypothetical orfeome: 497 genes can be annotated based on orthology, and another 125 can be linked to interacting proteins via integrated genomic context analysis and literature mining. The assignments include newly identified clusters of interacting proteins, hypothetical genes that are associated with well-known pathways and putative disease-relevant targets. Altogether, we have raised the fraction of the proteome with at least some functional annotation to 88%, which should considerably enhance the interpretation of large-scale experiments targeting this medically important organism

    Universally distributed single-copy genes indicate a constant rate of horizontal transfer

    Single copy genes, universally distributed across the three domains of life and encoding mostly ancient parts of the translation machinery, are thought to be only rarely subjected to horizontal gene transfer (HGT). Indeed it has been proposed to have occurred in only a few genes and implies a rare, probably not advantageous event in which an ortholog displaces the original gene and has to function in a foreign context (orthologous gene displacement, OGD). Here, we have utilised an automatic method to identify HGT based on a conservative statistical approach capable of robustly assigning both donors and acceptors. Applied to 40 universally single copy genes we found that as many as 68 HGTs (implying OGDs) have occurred in these genes with a rate of 1.7 per family since the last universal common ancestor (LUCA). We examined a number of factors that have been claimed to be fundamental to HGT in general and tested their validity in the subset of universally distributed single copy genes. We found that differing functional constraints impact rates of OGD and the more evolutionarily distant the donor and acceptor, the less likely an OGD is to occur. Furthermore, species with larger genomes are more likely to be subjected to OGD. Most importantly, regardless of the trends above, the number of OGDs increases linearly with time, indicating a neutral, constant rate. This suggests that levels of HGT above this rate may be indicative of positively selected transfers that may allow niche adaptation or bestow other benefits to the recipient organism

    Identifying single copy orthologs in Metazoa

    The identification of single copy (1-to-1) orthologs in any group of organisms is important for functional classification and phylogenetic studies. The Metazoa are no exception, but only recently has there been a wide-enough distribution of taxa with sufficiently high quality sequenced genomes to gain confidence in the wide-spread single copy status of a gene. Here, we present a phylogenetic approach for identifying overlooked single copy orthologs from multigene families and apply it to the Metazoa. Using 18 sequenced metazoan genomes of high quality we identified a robust set of 1,126 orthologous groups that have been retained in single copy since the last common ancestor of Metazoa. We found that the use of the phylogenetic procedure increased the number of single copy orthologs found by over a third compared with standard taxon-count approaches. The orthologs represented a wide range of functional categories, expression profiles and levels of divergence. To demonstrate the value of our set of single copy orthologs, we used them to assess the completeness of 24 currently published metazoan genomes and 62 EST datasets. We found that the annotated genes in published genomes vary in coverage from 79% (Ciona intestinalis) to 99.8% (human) with an average of 92%, suggesting a value for the underlying error rate in genome annotation, and a strategy for identifying single copy orthologs in larger datasets. In contrast, the vast majority of EST datasets with no corresponding genome sequence available are largely under-sampled and probably do not accurately represent the actual genomic complement of the organisms from which they are derived
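
The completeness assessment described in this abstract can be pictured as the fraction of reference single copy ortholog groups that are found in a gene set. The following is only a minimal sketch of that idea, not the authors' procedure: the function name `completeness` and the group identifiers are hypothetical, and the toy data are not taken from the study.

```python
def completeness(annotated_groups, reference_groups):
    """Return the fraction of reference ortholog groups present in a gene set.

    Both arguments are iterables of group identifiers (hypothetical labels).
    """
    reference = set(reference_groups)
    if not reference:
        raise ValueError("reference set must not be empty")
    # Only groups that belong to the reference set count towards coverage.
    found = reference & set(annotated_groups)
    return len(found) / len(reference)


# Toy data: invented identifiers for illustration only.
reference = {"OG1", "OG2", "OG3", "OG4"}
genome_a = {"OG1", "OG2", "OG3", "OG4", "OG9"}  # contains every reference group
genome_b = {"OG1", "OG3", "OG4"}                # one reference group missing

print(completeness(genome_a, reference))
print(completeness(genome_b, reference))
```

Under this reading, a coverage such as the 79% reported for Ciona intestinalis would mean that roughly four in five reference groups were detected among the annotated genes.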

    Toward automatic reconstruction of a highly resolved tree of life

    We have developed an automatable procedure for reconstructing the tree of life with branch lengths comparable across all three domains. The tree has its basis in a concatenation of 31 orthologs occurring in 191 species with sequenced genomes. It revealed interdomain discrepancies in taxonomic classification. Systematic detection and subsequent exclusion of products of horizontal gene transfer increased phylogenetic resolution, allowing us to confirm accepted relationships and resolve disputed and preliminary classifications. For example, we place the phylum Acidobacteria as a sister group of delta-Proteobacteria, support a Gram-positive origin of Bacteria, and suggest a thermophilic last universal common ancestor

    Systematic Association of Genes to Phenotypes by Genome and Literature Mining

    One of the major challenges of functional genomics is to unravel the connection between genotype and phenotype. So far no global analysis has attempted to explore those connections in the light of the large phenotypic variability seen in nature. Here, we use an unsupervised, systematic approach for associating genes and phenotypic characteristics that combines literature mining with comparative genome analysis. We first mine the MEDLINE literature database for terms that reflect phenotypic similarities of species. Subsequently we predict the likely genomic determinants: genes specifically present in the respective genomes. In a global analysis involving 92 prokaryotic genomes we retrieve 323 clusters containing a total of 2,700 significant gene–phenotype associations. Some clusters contain mostly known relationships, such as genes involved in motility or plant degradation, often with additional hypothetical proteins associated with those phenotypes. Other clusters comprise unexpected associations; for example, a group of terms related to food and spoilage is linked to genes predicted to be involved in bacterial food poisoning. Among the clusters, we observe an enrichment of pathogenicity-related associations, suggesting that the approach reveals many novel genes likely to play a role in infectious diseases

    The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored

    An essential prerequisite for any systems-level understanding of cellular functions is to correctly uncover and annotate all functional interactions among proteins in the cell. Toward this goal, remarkable progress has been made in recent years, both in terms of experimental measurements and computational prediction techniques. However, public efforts to collect and present protein interaction information have struggled to keep up with the pace of interaction discovery, partly because protein-protein interaction information can be error-prone and require considerable effort to annotate. Here, we present an update on the online database resource Search Tool for the Retrieval of Interacting Genes (STRING); it provides uniquely comprehensive coverage and ease of access to both experimental as well as predicted interaction information. Interactions in STRING are provided with a confidence score, and accessory information such as protein domains and 3D structures is made available, all within a stable and consistent identifier space. New features in STRING include an interactive network viewer that can cluster networks on demand, updated on-screen previews of structural information including homology models, extensive data updates and strongly improved connectivity and integration with third-party resources. Version 9.0 of STRING covers more than 1100 completely sequenced organisms; the resource can be reached at http://string-db.or